Analysis of MWEs in Hindi Text Using NLTK
Abstract
Natural Language Toolkit (NLTK) is a general-purpose platform for processing natural (human) language data, and it also provides resources for Indian languages such as Hindi, Bangla, and Marathi. In the proposed work, the resources provided by NLTK are used to process Hindi text and then to analyse multiword expressions (MWEs). MWEs are lexical items that can be decomposed into multiple lexemes and that display lexical, syntactic, semantic, pragmatic, and/or statistical idiomaticity. The main focus of this paper is the processing and analysis of MWEs in Hindi text. The corpus used for Hindi text processing is taken from the well-known Hindi novel “KaramaBhumi” by Munshi PremChand. The results are analysed using the Hindi corpus provided by the Resource Centre for Indian Language Technology Solutions (CFILT), in order to assess the accuracy of the proposed work.
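The abstract does not say how MWE candidates are scored, so the following is only a minimal sketch of one common statistical approach: ranking adjacent word pairs by pointwise mutual information (PMI). It is written in plain Python so that it runs without any corpus downloads; NLTK's `nltk.collocations` module offers comparable bigram scoring. The helper names (`tokenize`, `bigram_pmi`), the `min_count` threshold, and the toy sentences are all assumptions made for illustration and are not taken from the paper or its corpus.

```python
import math
from collections import Counter


def tokenize(text):
    """Split text on whitespace after replacing the danda and common punctuation."""
    for mark in ("।", ",", "?", "!", ".", '"'):
        text = text.replace(mark, " ")
    return text.split()


def bigram_pmi(tokens, min_count=2):
    """Return {(w1, w2): PMI} for adjacent pairs seen at least min_count times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy sentences invented for illustration (not from the novel or the CFILT corpus).
sample = "राम ने काम शुरू किया । सीता ने काम शुरू किया । मोहन ने खाना शुरू किया ।"
tokens = tokenize(sample)
scores = bigram_pmi(tokens)

# Print candidate pairs, highest PMI first.
for pair, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(" ".join(pair), round(score, 2))
```

On a real corpus, the surviving pairs would still need filtering (for example by part of speech) before being treated as MWEs; a high PMI only indicates that two words co-occur more often than chance would predict.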
Similar Resources
Stepwise Mining of Multi-Word Expressions in Hindi
Multi-word expressions (MWEs) play an important role in all tasks that involve natural language processing. MWEs in Hindi are quite varied, and many of them are of types that are not encountered in English. In this paper, we examine different types of MWEs encountered in Hindi. Many of these have not received adequate attention from investigators. For example, ‘vaalaa’ constructs, doublets (w...
A System for Compound Noun Multiword Expression Extraction for Hindi
Compound noun multiword expressions are important for many NLP applications, such as machine translation and information retrieval. This paper describes a system for extracting Hindi compound noun multiword expressions (MWEs) from a given corpus. We identify major categories of compound noun MWEs, based on linguistic and psycholinguistic principles. Our extraction methods use various statistical co-...
Multiword Expressions Dataset for Indian Languages
Multiword Expressions (MWEs) are used frequently in natural languages, but understanding the diversity of MWEs is one of the open problems in the area of Natural Language Processing. In the context of Indian languages, MWEs play an important role. In this paper, we present an MWE annotation dataset created for two Indian languages, Hindi and Marathi. We extract possible MWE candidates using two r...
Detection of Multiword Expressions for Hindi Language using Word Embeddings and WordNet-based Features
Detection of Multiword Expressions (MWEs) is a challenging problem faced by several natural language processing applications. The difficulty stems from the task of detecting MWEs with respect to a given context. In this paper, we propose approaches that use Word Embeddings and WordNet-based features for the detection of MWEs in the Hindi language. These approaches are restricted to two types of...
Detection of Compound Nouns and Light Verb Constructions using IndoWordNet
Detection of Multiword Expressions (MWEs) is one of the fundamental problems in Natural Language Processing. In this paper, we focus on two categories of MWEs: Compound Nouns and Light Verb Constructions. These two categories can be tackled using knowledge bases, rather than pure statistics. We investigate the usability of IndoWordNet for the detection of MWEs. Our IndoWordNet based approach uses se...